Learning with structured data: applications to computer vision
Abstract
In this thesis we address structured machine learning problems. Here “structured” refers to situations in which the input or output domain of a prediction function is non-vectorial: the input instance or the predicted value can be decomposed into parts that follow certain dependencies, relations and constraints. Throughout the thesis we use hard computer vision tasks as a rich source of structured machine learning problems.

In the first part of the thesis we consider structure in the input domain. We develop a general framework based on the notion of substructures. The framework is broadly applicable, and we show how to cast two computer vision problems, class-level object recognition and human action recognition, in terms of classifying structured input data. For class-level object recognition we model images as labeled graphs that encode local appearance statistics at vertices and pairwise geometric relations at edges. Recognizing an object can then be posed within our substructure framework as finding discriminative matching subgraphs. For the recognition of human actions we apply a similar principle: we model a video as a sequence of local motion information, so that recognizing an action becomes recognizing a matching subsequence within the larger video sequence. For both applications, our framework enables us to find the discriminative substructures from training data. The main contribution of this first part is a set of abstract algorithms for our framework that enable the construction of powerful classifiers for a large family of structured input domains.

The second part of the thesis addresses structure in the output domain of a prediction function. Specifically, we consider image segmentation problems in which the produced segmentation must satisfy global properties such as connectivity. We develop a principled method to incorporate global interactions into computer vision random field models by means of linear programming relaxations. To further understand the solutions produced by general linear programming relaxations, we develop a novel and tractable concept of solution stability, where stability is quantified with respect to perturbations of the input data. This second part of the thesis makes progress in modeling, solving and understanding solution properties of hard structured prediction problems arising in computer vision. In particular, we show how previously intractable models integrating global constraints with local evidence can be well approximated. We further show how these solutions can be understood in light of their stability properties.
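As a rough illustration of the substructure idea from the first part of the abstract, the sketch below treats a video as a sequence of discrete motion codewords and scores it by which weighted patterns occur in it as subsequences. Everything here is an assumption made for illustration and is not taken from the thesis: the integer codeword encoding, the choice to allow gaps when matching, the hand-picked patterns and weights, and the function names `contains_subsequence`, `score` and `classify`. In the thesis the discriminative substructures are found from training data; this sketch only shows how such patterns could be applied once chosen.

```python
from typing import List, Sequence, Tuple


def contains_subsequence(pattern: Sequence[int], sequence: Sequence[int]) -> bool:
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `symbol in remaining` advances the iterator past the first match,
    # so later symbols can only match later positions in the sequence.
    return all(symbol in remaining for symbol in pattern)


def score(
    sequence: Sequence[int],
    weighted_patterns: List[Tuple[Sequence[int], float]],
    bias: float = 0.0,
) -> float:
    """Sum the weights of all patterns found in `sequence`, plus a bias."""
    return bias + sum(
        weight
        for pattern, weight in weighted_patterns
        if contains_subsequence(pattern, sequence)
    )


def classify(
    sequence: Sequence[int],
    weighted_patterns: List[Tuple[Sequence[int], float]],
    bias: float = 0.0,
) -> int:
    """Return +1 if the score is non-negative, otherwise -1."""
    return 1 if score(sequence, weighted_patterns, bias) >= 0.0 else -1


# Hypothetical codeword sequence and hand-picked weighted patterns.
video = [1, 2, 3]
patterns = [([1, 3], 1.0), ([3, 1], -2.0)]
label = classify(video, patterns)
```

Each pattern acts as a binary feature (present or absent), and the classifier is a weighted vote over those features. The same outline would carry over to the graph case by replacing the subsequence test with a subgraph matching test.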
Similar resources
Computer assisted instruction during quarantine and computer vision syndrome
Computer vision syndrome (CVS) is a set of visual, ocular, and musculoskeletal symptoms that result from long-term computer use. These symptoms include eyestrain, dry eyes, burning, pain, redness, blurred vision, etc., which increase with the duration of computer use. Currently, with the closure of schools and universities due to the continued COVID-19 pandemic, many universities have taken the pr...
SpeedMachines: Anytime Structured Prediction
Structured prediction plays a central role in machine learning applications from computational biology to computer vision. These models require significantly more computation than unstructured models, and, in many applications, algorithms may need to make predictions within a computational budget or in an anytime fashion. In this work we propose an anytime technique for learning structured pred...
Human Computer Interaction Using Vision-Based Hand Gesture Recognition
With the rapid emergence of 3D applications and virtual environments in computer systems, the need for a new type of interaction device arises. This is because traditional devices such as the mouse, keyboard, and joystick become inefficient and cumbersome within these virtual environments. In other words, the evolution of user interfaces shapes the change in Human-Computer Interaction (HCI). In...
On Graph-Structured Discrete Labelling Problems in Computer Vision: Learning, Inference and Applications
A number of problems in computer vision (e.g., image segmentation, gender classification of faces, etc) can be formulated as graph-structured discrete labelling problems, where the goal is to predict labels (e.g. foreground/background, male/female) for a set of variables (e.g. pixels, faces in an image, etc) that have some known underlying structure (e.g., neighbouring pixels in an image often ...